Rmarkdown allows you to effortlessly generate documents (or even websites) that print both R codes and their outcomes (this lecture note is indeed written using Rmarkdown). This can be extremely useful when you report the analysis you conducted and its source R codes to your advisor or anyone you report to (as long as that person understands R). The full power of the Rmarkdown system is on display here.
If you tried to produce a document with R codes and their results in Word manually, it would be a real pain: you would need to copy and paste all the R codes you ran and their results into Word. Moreover, copied R codes and results are often badly formatted after pasting, which means that you need to spend lots of time reformatting them. Rmarkdown is basically a system that removes the need for repeated copying and pasting when you would like to communicate what you did (R codes) and what you found (results).
In order to use Rmarkdown, install the rmarkdown package:
install.packages("rmarkdown")
Generating a report using Rmarkdown is a two-step process:
1. You create an Rmarkdown file (a file with .Rmd as its extension) and write regular text and R codes mixed inside it. You use a special syntax to let the computer know which parts of the file are simple text and which parts are R codes.
2. You compile the Rmd file into the output document (behind the scenes, this is done by the render() function).

In a text editor like Word, what you write is what you see as the final document. However, what you write in an Rmarkdown file is not what you will see in the resulting document.
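The two-step process can also be sketched in R itself. This is only a minimal sketch, not part of the original notes: the file name example.Rmd and its contents are made up for illustration, and the rendering step is skipped when the rmarkdown package or pandoc is not available.

```r
# Three backticks, built here so the chunk delimiters are easy to read
fence <- strrep("`", 3)

# Step 1: create an Rmd file (hypothetical name and contents, for illustration only)
rmd_path <- file.path(tempdir(), "example.Rmd")
rmd_lines <- c(
  "---",
  "title: \"Untitled\"",
  "output: html_document",
  "---",
  "",
  "This is regular text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)
writeLines(rmd_lines, rmd_path)

# Step 2: compile the Rmd file into the output document
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  out_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(out_path)  # path to the generated html file
}
```

Hitting the Knit button in Rstudio does the second step for you, so you rarely need to call render() by hand.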
So, how can we let the computer know which parts of an Rmd file should be recognized as R codes? This is done by placing R codes within a special syntax, as in
```{r}
R codes
```
For example, summary(cars) is recognized as an R code below,
```{r}
summary(cars)
```
The figure below is a screenshot of an Rmd file opened in Rstudio (upper panel). Notice that the R codes summary(cars) and plot(cars) are each enclosed by the special syntax. So, in this rmd file, R knows that it should treat them as R codes, not as regular text. On the other hand, any text that is not enclosed by the special syntax is recognized as regular text.
(Fig: Sample Rmd file)
Now, by hitting the Knit button at the top of the upper panel, you can process this Rmd file to produce a document with the R codes presented and the outcomes of running them all in one document. Below is what the resulting document looks like:
(Fig: Sample document)
Comparing the two figures will give you a much clearer idea of how Rmarkdown works. First, the first 4 lines represent a YAML header. You don’t really need to know what YAML is at this point. The only thing you need to know here is that by specifying options in this header, you can do things like changing the title, the output file type, the color scheme of the resulting document, etc. For example, in the sample Rmd file, the title is set to Untitled and the file type of the output document is html. For those who are interested in learning more about the options that can be set in the YAML header, take a look here.
In lines 6-8, you have regular texts in the rmd file (they are not enclosed by the special syntax). Look how they are exactly the same as what you see in the output document after the title. Rmarkdown does not try to interpret regular texts.
Now, here comes an interesting difference between the rmd file and the output html file. summary(cars) at line 11 is recognized as R codes because it is enclosed by the special syntax. Look how the R code chunk is interpreted and translated in the output file. First, the exact text of the R code is printed. Then, the outcome of the evaluation of the code is placed below it. Notice that you did not have to copy and paste the code or its results manually! This is the beauty of Rmarkdown.
Now, going back to the rmd file, you have another line of regular text at line 14, which appears exactly the same in the output document, as expected. We then have another R code chunk in lines 16-18. Look what this chunk was translated into in the output document. It’s a plot of the dataset called cars, the outcome of the evaluation of plot(cars). At this point, you should have gotten the idea. But notice that the R code was not printed in the output document, unlike the previous R code. This is because of echo=FALSE in line 16. It tells the computer NOT to print (echo) the R code in the output document. Indeed, there are lots of chunk options you can use to control how R codes are interpreted or displayed.
There are a number of chunk options available. Here, I list some that you may use:
For the complete list of chunk options available, check out Yihui’s website or the Rstudio Reference Guide.
The following examples should make it clear how you can use options to control your output.
(Fig: Chunk options: echo and eval)
(Fig: Chunk options: message and warning)
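The four options named in the figures can also be tried out without creating a file. The sketch below is an illustration only: knit_chunk() is a made-up helper, and it relies on knitr::knit() accepting the document as a character vector through its text argument and returning the resulting markdown.

```r
library(knitr)

# Three backticks, built here so the chunk delimiters are easy to read
fence <- strrep("`", 3)

# Hypothetical helper: knit one chunk with the given options and return the markdown
knit_chunk <- function(options, code) {
  src <- c(paste0(fence, "{r, ", options, "}"), code, fence)
  knit(text = src, quiet = TRUE)
}

# echo = FALSE: the code is hidden, the result is still shown
out_no_echo <- knit_chunk("echo = FALSE", "1 + 1")

# eval = FALSE: the code is shown, but it is not run
out_no_eval <- knit_chunk("eval = FALSE", "1 + 1")

# message = FALSE and warning = FALSE: messages and warnings are kept out of the document
out_quiet <- knit_chunk(
  "message = FALSE, warning = FALSE",
  "message('a note'); warning('careful'); 1 + 1"
)

cat(out_no_echo, out_no_eval, out_quiet, sep = "\n\n")
```

Printing the three strings side by side makes it easy to see which part of each chunk (code, result, messages) survives in the output.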
By default, the echo option is set to TRUE for all the chunks. Sometimes, you do not want any of the R codes to appear on the output document. For example, if you are writing a term paper, the instructor may want to see only results, but not R codes. In such cases, it would be painful to type echo=FALSE for every single R code chunk especially when you are writing a long document with lots of R code chunks. Fortunately, you can set chunk options globally so that the chunk options are effective throughout the document. This can be done using opts_chunk$set() from the knitr package.
```{r, echo=FALSE}
# load the knitr library
library(knitr)
# set chunk options globally here
opts_chunk$set(
echo=FALSE,
message = FALSE,
warning = FALSE,
tidy=FALSE,
fig.align='center',
fig.width=5,
fig.height=4
# dev='pdf'
)
```
Here is a sample code with the above R code chunk and its output:
(Fig: Setting chunk options globally)
Note that in the Rmd file, you see an R code chunk in lines 5-20. The chunk first calls library(knitr). We then have opts_chunk$set() with lots of options specified inside the parentheses. Some of them look familiar. echo=FALSE prevents the R codes from appearing in the output document. The options specified here are applied globally to the subsequent part of the Rmd file. Note that the chunk with summary(cars) does not print summary(cars) in the output document (right panel) even though you do not have echo=FALSE in the chunk. This is because the echo=FALSE option specified above is in effect. However, it is possible to override the option specified in opts_chunk$set() locally. In the third R code chunk, the echo=TRUE option is added. Consequently, you do see the R codes in the output document on the right. So, you successfully negated the global option of echo=FALSE only for this particular (local) chunk.
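The interplay between a global option and a local override can be checked with a small experiment. This is a sketch under the assumption that knitr is installed; the three chunks below are invented for illustration and are not the ones in the figure.

```r
library(knitr)

# Three backticks, built here so the chunk delimiters are easy to read
fence <- strrep("`", 3)

rmd_text <- c(
  # first chunk: set echo = FALSE globally
  paste0(fence, "{r, echo = FALSE}"),
  "knitr::opts_chunk$set(echo = FALSE)",
  fence,
  "",
  # second chunk: no local option, so the global echo = FALSE applies
  paste0(fence, "{r}"),
  "nrow(cars)",
  fence,
  "",
  # third chunk: echo = TRUE overrides the global option locally
  paste0(fence, "{r, echo = TRUE}"),
  "ncol(cars)",
  fence
)

md <- knit(text = rmd_text, quiet = TRUE)

# Make sure echo is back to its default for the rest of this R session
opts_chunk$set(echo = TRUE)

cat(md)
```

In the printed markdown, the code nrow(cars) should be absent while ncol(cars) is shown, although the results of both chunks appear.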
In the course of creating a document using Rmarkdown, you are going to hit the “Knit” button numerous times to check whether the final output looks fine. Now, every time you knit, all the R code chunks are evaluated, including those you know work just fine. This is inefficient because R has evaluated those R code chunks before. So, if we can somehow store the results of R code chunks (caching), and then let R call up the saved results instead of re-evaluating the codes all over again, we can save lots of time. The benefit of doing so is greater when the processing time of the codes is longer. Caching can be done by adding cache=TRUE as a chunk option. With this option, once an R chunk is processed, its results are saved and can be reused by R later when you compile the document again.
When any part of the R codes within a cached R code chunk is changed, R is smart enough to recognize the change and evaluate the R code chunk again. Now, sometimes, the R codes within a cached R code chunk have not changed, but the content of a dataset used in the R code chunk has changed. In such a case, R is unable to recognize the change in the content of the dataset. To R, everything looks the same because it only looks at the text of the code, not at the contents of R objects. Therefore, R would call up the saved results instead of rerunning the R codes, which is not what you want.
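One common workaround, not covered in these notes, is to make the chunk options themselves depend on the data file, for example through knitr's cache.extra chunk option combined with tools::md5sum(). The sketch below only demonstrates the underlying idea: the fingerprint of a file changes when its content changes. The file and its values are made up for illustration.

```r
# Hypothetical data file, created here only so the example is self-contained
csv_path <- file.path(tempdir(), "corn_price.csv")
write.csv(data.frame(year = 2019:2020, price = c(3.8, 4.5)),
          csv_path, row.names = FALSE)
hash_before <- unname(tools::md5sum(csv_path))

# Change the data without touching any R code
write.csv(data.frame(year = 2019:2021, price = c(3.8, 4.5, 5.4)),
          csv_path, row.names = FALSE)
hash_after <- unname(tools::md5sum(csv_path))

# The fingerprint differs, so a chunk declared with
#   cache = TRUE, cache.extra = tools::md5sum("corn_price.csv")
# would see a changed chunk option and be evaluated again
hash_before != hash_after
```

Because a change in any chunk option invalidates the cache, tying an option to the file's fingerprint lets R notice changes in the data as well as in the code.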
You will be asked to present regression results numerous times for your assignments and final paper. Here, we learn how to report regression results in a nicely formatted table using the stargazer() function.
#--- load the AER package ---#
library(AER)
#--- get the HousePrices data ---#
data(HousePrices)
#--- take a look at a portion of the data ---#
head(HousePrices[,1:5])
## price lotsize bedrooms bathrooms stories
## 1 42000 5850 3 1 2
## 2 38500 4000 2 1 1
## 3 49500 3060 3 1 1
## 4 60500 6650 3 1 2
## 5 61000 6360 2 1 1
## 6 66000 4160 3 1 1
Let’s run a regression:
#--- run a regression ---#
reg <- lm(price~lotsize+bedrooms+bathrooms+stories,data=HousePrices)
#--- summary of the results ---#
summary(reg)
##
## Call:
## lm(formula = price ~ lotsize + bedrooms + bathrooms + stories,
## data = HousePrices)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52758 -11546 -806 8902 85312
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4009.5500 3603.1090 -1.113 0.2663
## lotsize 5.4292 0.3692 14.703 < 2e-16 ***
## bedrooms 2824.6138 1214.8076 2.325 0.0204 *
## bathrooms 17105.1745 1734.4341 9.862 < 2e-16 ***
## stories 7634.8970 1007.9745 7.574 1.57e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18270 on 541 degrees of freedom
## Multiple R-squared: 0.5355, Adjusted R-squared: 0.5321
## F-statistic: 156 on 4 and 541 DF, p-value: < 2.2e-16
While the summary table looks okay, you can have the regression results presented in a much more professional manner using the stargazer() function.
Install the package if you have not.
#--- install stargazer if you have not ---#
install.packages('stargazer')
Load the package to use it.
#--- load the package ---#
library('stargazer')
You can simply put the regression results (here, reg) as an argument of the stargazer() function, as in:
stargazer(reg, type = "html")
| | Dependent variable: |
|---|---|
| | price |
| lotsize | 5.429*** |
| | (0.369) |
| bedrooms | 2,824.614** |
| | (1,214.808) |
| bathrooms | 17,105.170*** |
| | (1,734.434) |
| stories | 7,634.897*** |
| | (1,007.974) |
| Constant | -4,009.550 |
| | (3,603.109) |
| Observations | 546 |
| R² | 0.536 |
| Adjusted R² | 0.532 |
| Residual Std. Error | 18,265.230 (df = 541) |
| F Statistic | 155.953*** (df = 4; 541) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
Do not forget to add an option type="html" if your final output type is html.
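The type argument also accepts other values. As a small sketch (the regression on the built-in cars dataset is used only for illustration, and the call is skipped if stargazer is not installed), type = "text" is handy for a quick look in the console:

```r
# Fit a small regression on the built-in cars dataset (for illustration only)
fit <- lm(dist ~ speed, data = cars)

if (requireNamespace("stargazer", quietly = TRUE)) {
  # type = "text" prints a plain-text table, handy for checking results in the console
  stargazer::stargazer(fit, type = "text")
  # type = "html" is the one to use when the final output is html
  # type = "latex" (the default) is the one to use when the final output is pdf
}
```

Checking the table with type = "text" first can save a few rounds of knitting before you switch to the type that matches your output document.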
This is a publication-quality regression results table. Virtually all economics journals report regression results in a similar format. Now, to have this nicely formatted table, you need to add results = 'asis' as a chunk option, as in
```{r, results = 'asis'}
stargazer(reg, type = "html")
```
This is what it would look like in the output html file if you forget to add the chunk option, as in
```{r}
stargazer(reg, type = "html")
```
##
## <table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
## <tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
## <tr><td style="text-align:left"></td><td>price</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">lotsize</td><td>5.429<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.369)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">bedrooms</td><td>2,824.614<sup>**</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(1,214.808)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">bathrooms</td><td>17,105.170<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(1,734.434)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">stories</td><td>7,634.897<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(1,007.974)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">Constant</td><td>-4,009.550</td></tr>
## <tr><td style="text-align:left"></td><td>(3,603.109)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>546</td></tr>
## <tr><td style="text-align:left">R<sup>2</sup></td><td>0.536</td></tr>
## <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.532</td></tr>
## <tr><td style="text-align:left">Residual Std. Error</td><td>18,265.230 (df = 541)</td></tr>
## <tr><td style="text-align:left">F Statistic</td><td>155.953<sup>***</sup> (df = 4; 541)</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
## </table>
Yeah, not pretty.
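What results = 'asis' changes can be seen with a toy example that does not need stargazer. This is a sketch assuming knitr is installed; knit_with() is a made-up helper, and cat() stands in for any function that prints html.

```r
library(knitr)

# Three backticks, built here so the chunk delimiters are easy to read
fence <- strrep("`", 3)

# Hypothetical helper: knit a one-line chunk that prints a piece of html
knit_with <- function(header) {
  knit(text = c(paste0(fence, header), "cat('<b>hello</b>')", fence), quiet = TRUE)
}

default_md <- knit_with("{r, echo = FALSE}")
asis_md <- knit_with("{r, echo = FALSE, results = 'asis'}")

cat(default_md)  # the html is wrapped in a code block and prefixed with ##
cat(asis_md)     # the html is passed through untouched
```

In the first case the browser shows the raw tags as text, which is exactly what happened to the stargazer table above; in the second case the tags reach the html document as they are and get rendered.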
When you forget to include r in the R code chunk declaration syntax, the chunk will be recognized as regular text, not as an R code chunk.
```{}
summary(cars)
```
The correct R code chunk declaration syntax would have been this one.
```{r}
summary(cars)
```
Suppose you had an R code chunk like below:
```{r}
corn_data <- read.csv("corn_price.csv")
summary(corn_data)
```
When you knit to compile an Rmd file, corn_data <- read.csv("corn_price.csv") will read the dataset named corn_price.csv and save it as corn_data. Then summary(corn_data) will summarize the data. Now, by default, Rstudio looks for corn_price.csv in the same folder in which the Rmd file is located. For example, suppose your working Rmd file is in /Users/tmieno2/Desktop. This means that Rstudio looks for the file named corn_price.csv in /Users/tmieno2/Desktop. If the file is not in that directory, Rstudio won’t be able to find the file to import and returns an error. Clearly, all the subsequent actions that depend on the dataset will not run.
So, if you use datasets, make sure you put them in the same directory in which your rmd file is located!!!
There are other alternatives to avoid the problem. First, even if the dataset is not in the same folder as the rmd file, you could supply the full path to the dataset. Suppose corn_price.csv is stored in /Users/tmieno2/Dropbox, rather than /Users/tmieno2/Desktop. Then the following would work:
```{r}
corn_data <- read.csv("/Users/tmieno2/Dropbox/corn_price.csv")
summary(corn_data)
```
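Alternatively, you can tell Rstudio to look in a specific directory for datasets by setting a working directory. You can do so by using opts_knit$set(root.dir=directory) from the knitr package, as in
```{r}
# opts_knit comes from the knitr package; the new root applies to the chunks that follow
knitr::opts_knit$set(root.dir = "/Users/tmieno2/Dropbox")
```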
It would be a good practice to put all the datasets you intend to use in the same folder and set the root directory to that folder at the beginning of the rmd file.
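A defensive habit that helps with this kind of error is to build the path explicitly and check that the file exists before reading it. The sketch below is illustrative only: the folder, the file, and its values are created in a temporary location so that the code runs anywhere.

```r
# Hypothetical folder and file, created in a temporary location for illustration
data_dir <- file.path(tempdir(), "datasets")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(year = 2019:2020, price = c(3.8, 4.5)),
          file.path(data_dir, "corn_price.csv"), row.names = FALSE)

# Build the full path and check it before reading
csv_path <- file.path(data_dir, "corn_price.csv")
if (!file.exists(csv_path)) {
  stop("Cannot find ", csv_path, "; the working directory is ", getwd())
}

corn_data <- read.csv(csv_path)
summary(corn_data)
```

The error message prints the working directory, which is usually the first thing you need to know when a file cannot be found.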
It is unavoidable that you get frustrated with numerous errors you encounter when compiling an Rmd file. This WILL happen to you. However, a relatively new feature of Rstudio has made it a lot easier to write an Rmd file. The feature is called R Notebooks, which displays the outcomes of R codes right under the R code chunk within the rmd file. Take a look at the figure below:
(Fig: Notebook feature of Rstudio)
Note that the outcome of summary(cars) is displayed right after its R code chunk. To have R codes evaluated and displayed, hit the green triangle at the upper right corner of the R code chunk. If you want to hide the outcomes, click on the X button at the upper right corner of the outcome panel.
If there are errors in the R codes, you will see error messages in the outcome panel. Evaluation of summary(corn) in the second R code chunk produces an error because there is no registered dataset called corn in the working R environment. So, you have the Error in summary(corn) : object 'corn' not found message displayed in the outcome panel. Before you hit the Knit button, make sure that no R code chunks produce errors.
Finally, look at the third R code chunk. You notice that a tool bar is missing at its upper right corner. Right, this chunk is not recognized as an R code chunk because you are missing r in the R code chunk syntax. This means that you should always be able to avoid the error mentioned earlier.
There are three output types available: html, Word, and pdf. To select the output type, first click on the black triangle button next to the “Knit” button, and then select your preferred output type, as shown below:
(Fig: Output types)
By far the preferred type of the three is html if you do not intend to print out the output document. html has no concept of a page. Consequently, you do not have to worry about how you should organize texts, tables, and figures within a page (a fixed amount of space). Moreover, when you have regression results to display, Word is not the best option because they are formatted very poorly at the moment (this may change in the future).
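The same choice can be made from code through the output_format argument of rmarkdown::render(). The following is a minimal sketch: render_as() is a made-up helper, and it does nothing when rmarkdown or pandoc is not available.

```r
# Map the three output types to the format names understood by rmarkdown::render()
output_formats <- c(
  html = "html_document",
  word = "word_document",
  pdf = "pdf_document"
)

# Hypothetical helper: render `input` to the chosen output type
render_as <- function(input, type = c("html", "word", "pdf")) {
  type <- match.arg(type)
  if (!requireNamespace("rmarkdown", quietly = TRUE) ||
      !rmarkdown::pandoc_available()) {
    message("rmarkdown or pandoc is not available; skipping render")
    return(invisible(NULL))
  }
  rmarkdown::render(input, output_format = output_formats[[type]], quiet = TRUE)
}
```

For example, render_as("report.Rmd", "word") would produce a Word file from a (hypothetical) report.Rmd. Note that the pdf type additionally requires a LaTeX installation.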
Here is a list of some useful resources to learn Rmarkdown. What you learned here is the bare minimum. For those who are keen to advance their Rmarkdown skills further, the following resources are useful.